A Framework for Kernel Regularization with Application to Protein Clustering
نویسندگان
چکیده
We develop and apply a novel framework which is designed to extract information in the form of a positive definite kernel matrix from possibly crude, noisy, incomplete, inconsistent dissimilarity information between pairs of objects, obtainable in a variety of contexts. Any positive definite kernel defines a consistent set of distances, and the fitted kernel provides a set of coordinates in Euclidean space which attempt to respect the information available, while controlling for complexity of the kernel. The resulting set of coordinates are highly appropriate for visualization and as input to classification and clustering algorithms. The framework is formulated in terms of a class of optimization problems which can be solved efficiently using modern convex cone programming software. The power of the method is illustrated in the context of protein clustering based on primary sequence data. An application to the globin family of proteins resulted in a readily visualizable 3D sequence space of globins, where several sub-families and sub-groupings consistent with the literature were easily identifiable.
منابع مشابه
Framework for kernel regularization with application to protein clustering.
We develop and apply a previously undescribed framework that is designed to extract information in the form of a positive definite kernel matrix from possibly crude, noisy, incomplete, inconsistent dissimilarity information between pairs of objects, obtainable in a variety of contexts. Any positive definite kernel defines a consistent set of distances, and the fitted kernel provides a set of co...
متن کاملMultiple Kernel Clustering Framework with Improved Kernels
Multiple kernel clustering (MKC) algorithms have been successfully applied into various applications. However, these successes are largely dependent on the quality of pre-defined base kernels, which cannot be guaranteed in practical applications. This may adversely affect the clustering performance. To address this issue, we propose a simple while effective framework to adaptively improve the q...
متن کاملیادگیری نیمه نظارتی کرنل مرکب با استفاده از تکنیکهای یادگیری معیار فاصله
Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
متن کاملEncoding Dissimilarity Data for Statistical Model Building.
We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding n...
متن کاملKernel Cuts: MRF meets Kernel&Spectral Clustering
The log-likelihood energy term in popular model-fitting segmentation methods, e.g. [64, 14, 50, 20], is presented as a generalized “probabilistic” K-means energy [33] for color space clustering. This interpretation reveals some limitations, e.g. over-fitting. We propose an alternative approach to color clustering using kernel K-means energy with well-known properties such as non-linear separati...
متن کامل